Abstract

For a probability measure $\mu$ on a real separable Hilbert space $H$, we are interested in "volume-based" approximations of the $d$-dimensional least squares error of $\mu$, i.e., least squares error with respect to a best fit $d$-dimensional affine subspace. Such approximations are given by averaging real-valued multivariate functions which are typically scalings of squared $(d+1)$-volumes of $(d+1)$-simplices in $H$. Specifically, we show that such averages are comparable to the square of the $d$-dimensional least squares error of $\mu$, where the comparison depends on a simple quantitative geometric property of $\mu$. This result is a higher dimensional generalization of the elementary fact that the double integral of the squared distances between points is proportional to the variance of $\mu$. We relate our work to two recent algorithms, one for clustering affine subspaces and the other for Monte-Carlo SVD based on volume sampling.

1 Introduction 

Our setting includes a real separable Hilbert space $H$ (with dot product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$), a Borel probability measure $\mu$ on $H$ and a fixed intrinsic dimension $d \in \mathbb{N}$. We assume that the support of $\mu$ is bounded. Let $AG_d(H)$ denote the affine Grassmannian on $H$, that is, the set of all $d$-flats (i.e., $d$-dimensional affine subspaces) in $H$. The $d$-dimensional least squares (LS) error for $\mu$ is

$$e_2(\mu,d) = \inf_{L \in AG_d(H)} \sqrt{\int \mathrm{dist}^2(x,L)\, d\mu(x)}, \tag{1}$$

where $\mathrm{dist}(x,L)$ denotes the distance of $x \in H$ to $L$.

We form functions $c : H^{d+2} \to \mathbb{R}$ whose squares have integrals that approximate $e_2^2(\mu,d)$. Denoting an arbitrary element of $H^{d+2}$ by $X = (x_0,\ldots,x_{d+1})$ and viewing it as a $(d+1)$-simplex in $H$, we express the desired comparison as follows:

$$e_2^2(\mu,d) \approx \int_{H^{d+2}} c^2(X)\, d\mu^{d+2}(X) \tag{2}$$

(i.e., the ratios of the LHS and RHS of (2) are bounded by constants, which may depend on $\mu$). Some of these functions are obtained by appropriate scaling of $(d+1)$-volumes. We denote by $M_{d+1}(X)$ the $(d+1)$-volume of any of the parallelotopes generated by the vertices of $X$. We also denote the diameter of $X$ by $\mathrm{diam}(X)$, i.e., the maximal edge length. An example of such a function $c$ is obtained by scaling $M_{d+1}(X)$ by a power of the diameter, i.e.,

$$c_{vol}(X) = \frac{M_{d+1}(X)}{\mathrm{diam}^d(X)}. \tag{3}$$

We refer to such functions as geometric condition numbers (GCNs), since they measure the geo- 
metric conditioning of the simplex X by a quantity that scales like the diameter of the simplex. 
The smaller they are the flatter, i.e., better-conditioned, the simplex is. 

When $d = 0$, (2) reduces to an elementary though useful identity, which we exemplify for the GCN $c_{vol}$. In this case, the best approximating 0-flat (i.e., best approximating point) is the mean, $\int x\, d\mu$, and $e_2^2(\mu,0)$ is the variance of $\mu$, that is,

$$e_2^2(\mu,0) = \int \Big\| x - \int x\, d\mu(x) \Big\|^2 d\mu(x) = \frac{1}{2} \int\!\!\int \|x_1 - x_2\|^2\, d\mu(x_1)\, d\mu(x_2).$$

Moreover,

$$c_{vol}(x_1,x_2) = \|x_1 - x_2\|,$$

and consequently,

$$e_2^2(\mu,0) = \frac{1}{2} \int\!\!\int c_{vol}^2(x_1,x_2)\, d\mu(x_1)\, d\mu(x_2). \tag{4}$$

Since our GCNs (of d + 2 variables) are constant multiples of the pairwise distance when d = 0, 
this identity extends to all of them (with possibly a different multiplicative constant). 
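
The identity (4), i.e., that the variance equals half of the double integral of the squared pairwise distances, is easy to check numerically. The following minimal Python sketch assumes a hypothetical atomic measure with equal mass on 50 random points in $\mathbb{R}^3$; the data, the seed and the helper `sq_dist` are illustrative choices and are not taken from the paper.

```python
import itertools
import random

random.seed(0)
# Hypothetical atomic measure: n equally weighted points in R^3.
points = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
n = len(points)


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Variance: mean squared distance to the mean (the best approximating 0-flat).
mean = [sum(p[k] for p in points) / n for k in range(3)]
variance = sum(sq_dist(p, mean) for p in points) / n

# Half of the double integral of squared pairwise distances: every ordered
# pair (x1, x2), including x1 == x2, carries mass 1/n^2.
half_pairwise = 0.5 * sum(
    sq_dist(p, q) for p, q in itertools.product(points, repeat=2)
) / n ** 2

print(variance, half_pairwise)
```

The two printed numbers should agree up to floating-point rounding, for any choice of points.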

This paper generalizes (4) to higher dimensional approximations and obtains estimates like (2) for various GCNs. This generalization restricts the type of measure $\mu$ by various conditions (depending on the GCN). Our weakest condition, which we refer to as $d$-separation, tries to avoid the concentration of $\mu$ around a subspace of dimension lower than $d$ (see Section 5.1 for a precise definition).

This investigation is partly motivated by the analysis of a recent spectral clustering method for 
data sampled from multiple subspaces [H [5]. The goodness of clustering for this method depends 
on the averaged GCN within each cluster and the theory developed here interprets this dependence 
in terms of the d-dimensional LS errors within clusters. We also relate our study to some aspects 
of volume-based sampling for fast SVD [8].

Many of our techniques are rooted in the theory of uniform rectifiability [6]. In particular, notions similar to the $d$-separation condition have appeared before for $d$-regular or upper $d$-regular measures (see Section 6 for their definitions) in [6, Lemma 5.8], [13, Lemma 2.3], [22, Lemma 8.2] and [15, Proposition 3.1]. Moreover, differently scaled functions of $d + 2$ variables, referred to as
discrete curvatures, were studied in p~8l [TH [TH [15] for d-regular measures. For example, while 
$M_{d+1}(X)$ is scaled by $\mathrm{diam}^d(X)$ to produce the geometric condition number $c_{vol}$, it can be scaled differently to obtain the following discrete curvature:

r (X) - M ^ X ) 

Cv ° l(X) -diam(^) 2 (X)- (5) 
It follows from [14, 15] that for $d$-regular measures the integral of $C_{vol}^2$ is comparable to the Jones-type flatness, which adds up appropriately normalized $d$-dimensional LS errors of certain balls of different radii centered at different locations. Another type of scaling of $M_{d+1}$ (or more precisely, an equivalent variant of it) appeared in [21] for exploring different geometric properties of the underlying measure.



1.1 Structure of This Paper and Additional Results 

In Section 2 we introduce notational conventions. In Section 3 we verify the existence of a LS $d$-flat minimizing the error $e_2(\mu,d)$ and construct it in terms of the singular value decomposition of a special operator, which we refer to as the data-to-features operator. In Section 4 we introduce $d$-dimensional GCNs of $d+2$ variables, in addition to $c_{vol}$. Section 5 controls $e_2^2(\mu,d)$ from above by integrals of these GCNs, whereas Section 6 bounds $e_2^2(\mu,d)$ from below by these integrals and thus concludes the desired comparisons. In Section 7 we form $d$-dimensional GCNs of both $d+1$ and $d$ variables, and we establish their comparisons. We also relate our work there to that of Deshpande et al. [8]. Section 8 puts this work in a statistical context by relating our results to clustering affine subspaces as well as extending some of the previous comparisons with high probability to the corresponding empirical quantities estimated from i.i.d. samples from $\mu$. We discuss further implications and possible extensions in Section 9.



2 Notational Conventions 
2.1 Comparisons 

For real-valued functions $f$ and $g$, we let $f \lesssim g$ denote the existence of $C > 0$ such that $f \le C \cdot g$. Similarly, $f \approx g$ if $f \lesssim g$ and $g \lesssim f$. The constants may depend on some arguments of $f$ and $g$, which we indicate if they are unclear from the context.



2.2 Simplices 

Fixing $n \in \mathbb{N}$, $n \ge 2$, we represent $n$-simplices in $H$ by ordered $(n+1)$-tuples of the product space $H^{n+1}$. We denote an element of $H^{n+1}$ by $X = (x_0,\ldots,x_n)$, and for $0 \le i \le n$, $(X)_i = x_i$ denotes the projection of $X$ onto its $i$-th $H$-valued coordinate (or vertex). For $0 \le i \le n$, $y \in H$ and $X \in H^{n+1}$ as above, we form the following elements:

$$X(i) = (x_0,\ldots,x_{i-1},x_{i+1},\ldots,x_n), \tag{6}$$

$$X(y,i) = (x_0,\ldots,x_{i-1},y,x_{i+1},\ldots,x_n). \tag{7}$$

The minimal edge length of $X$ is denoted by $\min(X)$. We define the following quantities of $X$ with respect to its zeroth coordinate $x_0$:

$$\max\nolimits_{x_0}(X) = \max_{1 \le j \le n} \|x_j - x_0\| \quad \text{and} \quad \min\nolimits_{x_0}(X) = \min_{1 \le j \le n} \|x_j - x_0\|. \tag{8}$$

For $X$ such that $\min(X) \ne 0$, let

$$\mathrm{scale}_{x_0}(X) = \frac{\min_{x_0}(X)}{\max_{x_0}(X)}. \tag{9}$$

We say that a simplex $X$ is well-scaled at $x_0$ (for $\lambda > 0$) if $\min(X) > 0$ and $\mathrm{scale}_{x_0}(X) \ge \lambda$.

We let $L[X]$ denote the affine subspace of $H$ of minimal dimension containing the vertices of $X$. We recall that for $n \in \mathbb{N}$, $M_n(X)$ is the $n$-volume of any of the parallelotopes generated by the vertices of $X$. We note that

$$M_n(X) = \mathrm{dist}(x_i, L[X(i)]) \cdot M_{n-1}(X(i)) \quad \text{for all } 0 \le i \le n. \tag{10}$$

3 Least Squares d-Flats and Their Construction

Formally, a LS $d$-flat for $\mu$ is a $d$-flat $L \in AG_d(H)$ for which the RHS of (1) obtains its minimal value. We show here that such $d$-flats exist, i.e., the function

$$F(L) = \int \mathrm{dist}^2(x,L)\, d\mu(x) \tag{11}$$

obtains its minimum among all $d$-flats $L$ in $AG_d(H)$. Moreover, we show how to construct a LS $d$-flat given the singular value decomposition (SVD) of the data-to-features operator described next.



3.1 The Data-to-Features Operator

We define the center of mass of $\mu$, $x_{cm}$, by

$$x_{cm} = \int_H x\, d\mu(x), \tag{12}$$

and denote by $L_2(\mu)$ the set of functions $f : H \to \mathbb{R}$ such that $\int |f(x)|^2\, d\mu(x) < \infty$. The data-to-features operator $A_\mu : H \to L_2(\mu)$ is

$$(A_\mu y)(x) = \langle y, x - x_{cm} \rangle \quad \text{for all } x, y \in H. \tag{13}$$

We use the name "data-to-features" operator since if $\mu$ is an atomic measure supported on $N$ "data points" in $H = \mathbb{R}^D$, then $A_\mu$ is represented by an $N \times D$ matrix whose rows are the data points, shifted by their center of mass. Therefore, in this case $A_\mu$ maps data points in $\mathbb{R}^D$ into $N$-dimensional feature vectors (containing coefficients according to the dictionary of shifted data points). We remark that the dependence of $A_\mu$ on $\mu$ is not only due to the use of $x_{cm}$, but also because the range of $A_\mu$ is in $L_2(\mu)$.

Next, we specify a kernel associated with $A_\mu$ and use it to conclude that $A_\mu$ is Hilbert-Schmidt. Let us arbitrarily fix an orthonormal basis of $H$, $\{e_n\}_{n \in \mathbb{N}}$, and express $A_\mu$ as follows:

$$(A_\mu y)(x) = \sum_{n \in \mathbb{N}} \langle y, e_n \rangle \langle e_n, x - x_{cm} \rangle \quad \text{for all } x, y \in H. \tag{14}$$

We can thus view it as an operator from $\ell_2$ (with the counting measure $\mu_{\mathbb{N}}$) to $L_2(\mu)$ with the kernel $k(x,n) = \langle e_n, x - x_{cm} \rangle$. We note that this kernel is in $L_2(\mu_{\mathbb{N}} \times \mu)$; indeed, using the fact that the support of $\mu$ is bounded we obtain that

$$\int \sum_{n \in \mathbb{N}} |\langle e_n, x - x_{cm} \rangle|^2\, d\mu(x) = \int \|x - x_{cm}\|^2\, d\mu(x) < \infty. \tag{15}$$

We thus conclude that $A_\mu$ is Hilbert-Schmidt and in particular compact (see e.g., [12, Section 4]).

Since $A_\mu$ is compact, we can apply its SVD [23, Section 1.6.2]. We denote the singular values of $A_\mu$, repeated according to multiplicities, by $\{\sigma_i\}_{i \in \mathbb{N}}$. Their corresponding right vectors are denoted by $\{v_i\}_{i \in \mathbb{N}}$. Equivalently, these are the orthonormal eigenvectors of $A_\mu^* A_\mu$ ($A_\mu^*$ is the adjoint of $A_\mu$) with eigenvalues $\{\sigma_i^2\}_{i \in \mathbb{N}}$. In Section 3.2 we apply the finiteness of $\sum_{i \in \mathbb{N}} \sigma_i^2$, which is equivalent to the Hilbert-Schmidt property of $A_\mu$.

3.2 Least Squares d-Flats by SVD of the Data-to-Features Operator 

We use the SVD of $A_\mu$ to construct a LS $d$-flat and express its corresponding error as follows:

Proposition 3.1. A LS $d$-flat for $\mu$ exists and is obtained by

$$x_{cm} + \mathrm{Sp}\{v_1,\ldots,v_d\},$$

where $v_1,\ldots,v_d$ are the top right vectors of the data-to-features operator $A_\mu$. It is unique if and only if $\sigma_d > \sigma_{d+1}$. Moreover,

$$e_2(\mu,d) = \sqrt{\sum_{i>d} \sigma_i^2}. \tag{16}$$

Proof. We express the function $F(L)$ of (11) in terms of a shift vector $c \in H$, a linear subspace $V \subseteq H$ and also in terms of the orthogonal projection of $H$ onto the orthogonal complement of $V$, which we denote by $P_V^{\perp}$. That is,

$$F(L) = F(c,V) = \int \mathrm{dist}^2(x, c+V)\, d\mu(x) = \int \|P_V^{\perp}(x - c)\|^2\, d\mu(x). \tag{17}$$

We further note that

$$F(c,V) = \int \|P_V^{\perp}(x - x_{cm})\|^2\, d\mu(x) + \|P_V^{\perp}(c - x_{cm})\|^2. \tag{18}$$

We thus conclude that the vector $c = x_{cm}$ minimizes $F(c,V)$ independently of $V$ (more generally, the set of minimizers is $x_{cm} + V$).
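
One way to see (18), sketched here under the assumption that $P_V^{\perp}$ denotes the orthogonal projection onto the orthogonal complement of $V$ as in (17), is to write $x - c = (x - x_{cm}) - (c - x_{cm})$ and expand the square:

```latex
\begin{align*}
F(c,V) &= \int \big\| P_V^{\perp}(x - x_{cm}) - P_V^{\perp}(c - x_{cm}) \big\|^2 \, d\mu(x) \\
       &= \int \big\| P_V^{\perp}(x - x_{cm}) \big\|^2 \, d\mu(x)
          - 2 \Big\langle P_V^{\perp} \Big( \int (x - x_{cm}) \, d\mu(x) \Big),\,
                          P_V^{\perp}(c - x_{cm}) \Big\rangle
          + \big\| P_V^{\perp}(c - x_{cm}) \big\|^2 .
\end{align*}
```

The middle term vanishes because $\int (x - x_{cm})\, d\mu(x) = 0$ by the definition of $x_{cm}$, and the last term needs no integral since $\mu$ is a probability measure. The remaining term involving $c$ is zero exactly when $c - x_{cm} \in V$.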
We next note that

$$\min_V \int \|P_V^{\perp}(x - x_{cm})\|^2\, d\mu(x) = \int \|x - x_{cm}\|^2\, d\mu(x) - \max_V \int \|P_V(x - x_{cm})\|^2\, d\mu(x), \tag{19}$$

where $P_V$ is the projection operator of $H$ onto $V$. Therefore, instead of minimizing $F(x_{cm}, V)$, we maximize the function

$$G(V) = \int \|P_V(x - x_{cm})\|^2\, d\mu(x) = \mathrm{trace}(P_V A_\mu^* A_\mu P_V). \tag{20}$$

The last equality in (20) is evident due to the following expression of the adjoint operator $A_\mu^*$:

$$A_\mu^* f = \int (x - x_{cm}) f(x)\, d\mu(x) \quad \text{for all } f \in L_2(\mu). \tag{21}$$

Indeed, if $\{e_n\}_{n=1}^{\dim(V)}$ is an orthonormal basis of $V$ and $1 \le n \le \dim(V)$, then

$$\langle e_n, P_V A_\mu^* A_\mu P_V e_n \rangle = \int \langle e_n, x - x_{cm} \rangle^2\, d\mu(x). \tag{22}$$

Thus, summing both the LHS and RHS over $n = 1,\ldots,\dim(V)$ we obtain the desired equality.

At last, we apply a theorem by Ky-Fan [S] (see also |1U| Theorem 3.5]) to conclude that the 
maximum of $G$ is attained at $V := \mathrm{Sp}\{v_1,\ldots,v_d\}$, where $v_1,\ldots,v_d$ are the top eigenvectors of $A_\mu^* A_\mu$, and it is unique if and only if $\sigma_d > \sigma_{d+1}$. That is, $x_{cm} + \mathrm{Sp}\{v_1,\ldots,v_d\}$ is a LS $d$-flat and unique whenever $\sigma_d > \sigma_{d+1}$. Furthermore,

$$e_2^2(\mu,d) = \min_{c,V} F(c,V) = \mathrm{trace}(A_\mu^* A_\mu) - \max_V \mathrm{trace}(P_V A_\mu^* A_\mu P_V) = \sum_{i>d} \sigma_i^2.$$

□
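
For an atomic measure, Proposition 3.1 can be checked numerically. The sketch below assumes equal mass $1/N$ on $N$ hypothetical data points in $\mathbb{R}^D$ (synthetic data with an arbitrary seed), so that $A_\mu$ is represented by the centered data matrix; the extra factor $1/\sqrt{N}$ is our way of encoding the $L_2(\mu)$ norm in this illustration. It compares the tail sum of squared singular values from (16) with the mean squared distance to the flat $x_{cm} + \mathrm{Sp}\{v_1,\ldots,v_d\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, d = 200, 5, 2

# Hypothetical data: points near a 2-flat in R^5, shifted away from the origin.
X = (rng.normal(size=(N, d)) @ rng.normal(size=(d, D))
     + 0.1 * rng.normal(size=(N, D)) + 3.0)

x_cm = X.mean(axis=0)                 # center of mass of the atomic measure
A = (X - x_cm) / np.sqrt(N)           # matrix of A_mu, scaled for the L_2(mu) norm
_, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Squared LS error as the tail sum of squared singular values, cf. (16).
e2_sq_svd = np.sum(sigma[d:] ** 2)

# Direct evaluation: mean squared distance to x_cm + Sp{v_1, ..., v_d}.
V = Vt[:d].T                          # columns are the top d right vectors
residual = (X - x_cm) - (X - x_cm) @ V @ V.T
e2_sq_direct = np.mean(np.sum(residual ** 2, axis=1))

print(e2_sq_svd, e2_sq_direct)
```

The two values should coincide up to rounding, and any other $d$-flat through $x_{cm}$ should give a mean squared distance that is at least as large.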



4 Examples of Geometric Condition Numbers on $H^{d+2}$

In addition to the GCN $c_{vol}$ defined in (3), we suggest four other GCNs of $d+2$ variables. Two of these squared GCNs are also scaled versions of this volume. The first one has the form

$$c_{vol,\mu}(X) = \frac{M_{d+1}(X)}{\mathrm{diam}^d(\mu)}. \tag{23}$$

The second one uses the $d$-dimensional polar sine [16]. For $0 \le i \le d+1$, the polar sine of $X = (x_0,\ldots,x_{d+1})$ with respect to the coordinate $x_i$ is

$$p_d\sin_{x_i}(X) = \begin{cases} \dfrac{M_{d+1}(X)}{\prod_{0 \le j \le d+1,\ j \ne i} \|x_j - x_i\|}, & \text{if } \min(X) > 0; \\ 0, & \text{otherwise.} \end{cases} \tag{24}$$

The corresponding polar GCN has the form:

$$c_{pol}(X) = \mathrm{diam}(X) \sqrt{\frac{\sum_{i=0}^{d+1} p_d\sin^2_{x_i}(X)}{d+2}}. \tag{25}$$

Another GCN is obtained by the $d$-dimensional LS error of the empirical measure associated with $X$ as follows:

$$c_{dls}(X) = \min_{L \in AG_d(H)} \sqrt{\frac{\sum_{i=0}^{d+1} \mathrm{dist}^2(x_i, L)}{d+2}}. \tag{26}$$

At last, we form the minimal height GCN:

$$c_{ht}(X) = \min_{0 \le i \le d+1} \mathrm{dist}(x_i, L[X(i)]). \tag{27}$$

We note that this GCN is practically comparable to an $\ell_\infty$ version of the $\ell_2$ GCN, $c_{dls}$. One can also form $\ell_p$ versions of such GCNs for all $1 \le p < \infty$, i.e., taking the $p$-th root of the average of $p$-th powers of the distances.
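
The GCNs above are straightforward to evaluate for a concrete simplex. The following NumPy sketch is an illustration under stated assumptions: the simplex is stored as a $(d+2) \times D$ array with one vertex per row, the vertices are in general position (so that no face is degenerate), the volume $M_{d+1}$ is computed as the square root of a Gram determinant, and the heights use the product formula (10). All function names are ours.

```python
import numpy as np


def volume(X):
    """Volume of the parallelotope spanned by the edges x_i - x_0 (Gram determinant)."""
    E = X[1:] - X[0]
    return np.sqrt(max(np.linalg.det(E @ E.T), 0.0))


def diam(X):
    """Maximal edge length of the simplex."""
    return max(np.linalg.norm(p - q) for p in X for q in X)


def c_vol(X):
    d = len(X) - 2
    return volume(X) / diam(X) ** d


def c_ht(X):
    # Product formula (10): dist(x_i, L[X(i)]) = M_{d+1}(X) / M_d(X(i)).
    return min(volume(X) / volume(np.delete(X, i, axis=0)) for i in range(len(X)))


def pd_sin(X, i):
    """Polar sine with respect to vertex i (assumes distinct vertices)."""
    edges = [np.linalg.norm(X[j] - X[i]) for j in range(len(X)) if j != i]
    return volume(X) / np.prod(edges)


def c_pol(X):
    return diam(X) * np.sqrt(np.mean([pd_sin(X, i) ** 2 for i in range(len(X))]))


def c_dls(X):
    # LS error of the empirical measure on the vertices, via the SVD of the
    # centered vertex matrix.
    d = len(X) - 2
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return np.sqrt(np.sum(s[d:] ** 2) / len(X))


rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4))           # a random 3-simplex in R^4, i.e., d = 2
print({f.__name__: f(X) for f in (c_vol, c_ht, c_pol, c_dls)})
```

For $d = 1$ the polar sine reduces to the ordinary sine of the angle at the chosen vertex, which gives a quick sanity check of `pd_sin` on a right triangle.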
The five GCNs on $H^{d+2}$ of this paper satisfy a variety of pointwise comparisons. For example, via the product formula of (10), as well as [14, eqs. (16), (118)], for arbitrary $X$ we have that

CyolA X ) < c voi(^) < cw(X) < { ' d+ J^ 2 ■ CdbPO- (28) 
Furthermore, from the definitions above we also have that 

$$c_{vol,\mu}(X) \le c_{vol}(X) \le c_{pol}(X). \tag{29}$$

In order to control integrals of c po \ by integrals of Cdi s , we will use the following inequality of [HI 
Proposition 3.2]: 

diam(X) p d sm X0 (X) < V2 ■ (d + 1) • (d + 2)1 • — ^ — • c dls (X), (30) 
where $\mathrm{scale}_{x_0}(X)$ was defined in (9).



5 Upper Bounds on $e_2^2(\mu,d)$
5.1 On d-Separated Measures

The $d$-separated measures form the weakest class of probability measures for which we can bound $e_2^2(\mu,d)$ by integrals of squared GCNs. Let $\mathrm{supp}(\mu)$ denote the support of $\mu$ and $\mathrm{diam}(\mu)$ denote the diameter of this support. We say that a $d$-simplex $X = (x_0,\ldots,x_d) \in \mathrm{supp}(\mu)^{d+1}$ is $d$-separated (for $\omega > 0$) if

$$M_d(X) \ge \omega \cdot \mathrm{diam}(\mu)^d. \tag{31}$$

We say that the measure $\mu$ is $d$-separated (with positive constants $\omega$ and $\epsilon$) if there exist sets $V_i \subseteq \mathrm{supp}(\mu)$, $0 \le i \le d$, that support $d$-separated $d$-simplices in the following way:

1. $\mu(V_i) \ge \epsilon$ for each $0 \le i \le d$. (32)

2. $\prod_{i=0}^{d} V_i \subseteq \{X \in \mathrm{supp}(\mu)^{d+1} : M_d(X) \ge \omega \cdot \mathrm{diam}(\mu)^d\}$. (33)

The sets $V_i$ can be taken to be balls, but this is not necessary and can be too restrictive.

We also say that $\mu$ is $d$-separated with respect to the center of mass of $\mu$, $x_{cm}$, or equivalently centrally $d$-separated, if there exist sets $V_i \subseteq \mathrm{supp}(\mu)$, $1 \le i \le d$, satisfying (32) for $1 \le i \le d$ as well as the following modification of (33):

$$\prod_{i=1}^{d} V_i \times \{x_{cm}\} \subseteq \{X \in \mathrm{supp}(\mu)^{d+1} : M_d(X) \ge \omega \cdot \mathrm{diam}(\mu)^d\}.$$

The following lemma shows that $d$-separation is a very general quantitative property in terms of information about $e_2(\mu,d-1)$.
Lemma 5.1.1. If $\mu$ is an arbitrary Borel probability measure on $H$, then the following statements are equivalent:

1. $\mu$ is $d$-separated.

2. There exists a $d$-simplex $X \in \mathrm{supp}(\mu)^{d+1}$ such that $M_d(X) > 0$.

3. $e_2(\mu,d-1) > 0$.

4. $\mu$ is centrally $d$-separated.

Proof. The equivalence of the first two statements of the lemma immediately follows from the continuity of $M_d$ and the following elementary observation (where $B(x,r)$ is the closed ball centered at $x$ of radius $r$):

$$\mathrm{supp}(\mu) = \{x \in H : \mu(B(x,r)) > 0 \text{ for all } r > 0\}.$$

To establish the equivalence of the second and third statements we first note that one direction is trivial. That is, $e_2(\mu,d-1) = 0$ implies that $M_d(X) = 0$ for all $X \in \mathrm{supp}(\mu)^{d+1}$, since all vertices will be trapped in a $(d-1)$-dimensional minimizing space $L$.

The other direction, i.e., $e_2(\mu,d-1) > 0$ implies that $M_d(X) > 0$ for some $X \in \mathrm{supp}(\mu)^{d+1}$, can be established by noting that for any affine space $L$ of dimension less than or equal to $d-1$ we have that

$$\int \mathrm{dist}^2(x,L)\, d\mu(x) \ge e_2^2(\mu,d-1).$$

Using this observation we can construct $d$ points, $x_0,\ldots,x_{d-1} \in \mathrm{supp}(\mu)$, such that for the $(d-1)$-simplex $X(d) = (x_0,\ldots,x_{d-1})$ we have that $M_{d-1}(X(d)) > 0$. Since

$$\int \mathrm{dist}^2(x, L[X(d)])\, d\mu(x) > 0,$$

we can select another point $x_d \in L[X(d)]^c \cap \mathrm{supp}(\mu)$ (where $L[X(d)]^c$ is the complement of $L[X(d)]$), and taking $X = X(x_d,d) \in \mathrm{supp}(\mu)^{d+1}$ we conclude that $M_d(X) > 0$.

The equivalence between the third and the fourth statements is proven in exactly the same way (recalling that $x_{cm}$ is contained in any LS $d$-flat). □

5.2 The Main Theorem for Upper Bounds on $e_2^2(\mu,d)$

Since all GCNs with $d+2$ variables suggested here control $c_{vol,\mu}$ (see (28)), we only need to bound $e_2^2(\mu,d)$ by an integral of $c_{vol,\mu}^2$.

Theorem 5.1. If $\mu$ is $d$-separated for the positive constants $\omega$ and $\epsilon$, then

$$e_2^2(\mu,d) \le \frac{1}{\omega^2 \cdot \epsilon^{d+1}} \int_{H^{d+2}} c_{vol,\mu}^2(X)\, d\mu^{d+2}(X). \tag{34}$$

Proof. We arbitrarily fix $X(d+1) = (x_0,\ldots,x_d) \in \prod_{i=0}^{d} V_i$, where $\{V_i\}_{i=0}^{d}$ are the sets of (32) and (33) defining the $d$-separated measure $\mu$ (with constants $\omega$ and $\epsilon$). It follows from both (33) and (10) that for $X(d+1)$ and $X(y,d+1)$ as in (6) and (7),

$$c_{vol,\mu}^2(X(y,d+1)) = \frac{M_d^2(X(d+1))}{\mathrm{diam}^{2d}(\mu)} \cdot \mathrm{dist}^2(y, L[X(d+1)]) \ge \omega^2 \cdot \mathrm{dist}^2(y, L[X(d+1)]) \tag{35}$$

and consequently

$$\frac{1}{\omega^2} \int_H c_{vol,\mu}^2(X(y,d+1))\, d\mu(y) \ge \int_H \mathrm{dist}^2(y, L[X(d+1)])\, d\mu(y) \ge e_2^2(\mu,d). \tag{36}$$

For $0 < \rho < \infty$ let

$$S_\rho = \Big\{ X(d+1) \in \prod_{i=0}^{d} V_i : \int_H c_{vol,\mu}^2(X(y,d+1))\, d\mu(y) \le \rho \int_{H^{d+2}} c_{vol,\mu}^2(X)\, d\mu^{d+2}(X) \Big\}. \tag{37}$$

By Chebyshev's inequality we have

$$\mu^{d+1}(S_\rho) \ge \mu^{d+1}\Big(\prod_{i=0}^{d} V_i\Big) - \frac{1}{\rho}.$$

Then, since $\mu^{d+1}(\prod_{i=0}^{d} V_i) \ge \epsilon^{d+1}$ by (32), taking $\rho > \epsilon^{-(d+1)}$ forces $S_\rho \ne \emptyset$. Thus, restricting $X(d+1)$ to $S_\rho$ for $\rho > \epsilon^{-(d+1)}$ and combining equations (36) and (37), we conclude that the inequality of Theorem 5.1 holds with the controlling constant $\rho/\omega^2$. Since this holds for arbitrary such $\rho$ we obtain the constant given in Theorem 5.1.

□
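
A small numerical experiment can make the upper bound concrete. The sketch below is an illustration under explicit assumptions: $d = 1$, a hypothetical atomic measure with equal mass on 12 random points of $\mathbb{R}^2$, the bound read in the form $e_2^2(\mu,d) \le (\omega^2 \epsilon^{d+1})^{-1} \int c_{vol,\mu}^2\, d\mu^{d+2}$, and the separation sets chosen as the singletons $V_0 = \{x_0\}$, $V_1 = \{x_1\}$ at a farthest pair of points (so $\epsilon = 1/N$ and $\omega = 1$). None of these choices come from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(size=(12, 2))        # hypothetical atomic measure, mass 1/N per point
N, d = len(pts), 1
diam_mu = max(np.linalg.norm(p - q) for p in pts for q in pts)


def volume(X):
    """Parallelotope volume spanned by the edges x_i - x_0 (Gram determinant)."""
    E = X[1:] - X[0]
    return np.sqrt(max(np.linalg.det(E @ E.T), 0.0))


# Squared LS error e_2^2(mu, d) from the SVD of the centered, mass-scaled data.
s = np.linalg.svd((pts - pts.mean(axis=0)) / np.sqrt(N), compute_uv=False)
e2_sq = np.sum(s[d:] ** 2)

# Integral of c_{vol,mu}^2 over mu^{d+2}: average over all ordered (d+2)-tuples.
integral = np.mean([
    (volume(pts[list(idx)]) / diam_mu ** d) ** 2
    for idx in itertools.product(range(N), repeat=d + 2)
])

# Singleton separation sets at a farthest pair: eps = 1/N, and for d = 1
# omega = M_1(x_0, x_1) / diam(mu) = 1.
eps, omega = 1.0 / N, 1.0
bound = integral / (omega ** 2 * eps ** (d + 1))

print(e2_sq, bound)
```

With singleton sets the constant $\epsilon^{-(d+1)} = N^{d+1}$ is large, so the bound is far from tight here; sets $V_i$ of larger mass give a smaller constant at the price of a smaller $\omega$.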

6 Lower Bounds for $e_2^2(\mu,d)$

We first verify a lower bound on $e_2^2(\mu,d)$ by an integral of $c_{dls}^2$. Since the GCNs $c_{vol,\mu}$, $c_{vol}$ and $c_{ht}$ are controlled by $c_{dls}$ (see (28)), this bound also holds for all of these GCNs.

Proposition 6.1. If $\mu$ is an arbitrary Borel probability measure on $H$, then

$$\int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X) \le e_2^2(\mu,d). \tag{38}$$

Proof. For any fixed $d$-flat $L \in AG_d(H)$, by the definition of the GCN $c_{dls}(X)$ and a subsequent application of Fubini's Theorem we obtain that

$$\int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X) \le \frac{1}{d+2} \sum_{i=0}^{d+1} \int_{H^{d+2}} \mathrm{dist}^2((X)_i, L)\, d\mu^{d+2}(X) = \int_H \mathrm{dist}^2(x,L)\, d\mu(x).$$

The proposition is concluded by taking the infimum over all $L \in AG_d(H)$. □
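
Proposition 6.1 can be verified exactly for a small atomic measure by enumerating all $(d+2)$-tuples. The sketch below assumes $d = 1$ and equal mass on 8 hypothetical points in $\mathbb{R}^3$; the helper `ls_error_sq` (our name) computes the squared $d$-dimensional LS error of the uniform measure on the rows of a matrix, which for a tuple $X$ is exactly $c_{dls}^2(X)$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical anisotropic point cloud, equal mass 1/N per point.
pts = rng.normal(size=(8, 3)) * np.array([3.0, 1.0, 0.2])
N, d = len(pts), 1


def ls_error_sq(Y, d):
    """Squared d-dimensional LS error of the uniform measure on the rows of Y."""
    s = np.linalg.svd((Y - Y.mean(axis=0)) / np.sqrt(len(Y)), compute_uv=False)
    return np.sum(s[d:] ** 2)


e2_sq = ls_error_sq(pts, d)

# Integral of c_dls^2 over mu^{d+2}: average over all ordered (d+2)-tuples,
# repeated points included.
avg_c_dls_sq = np.mean([
    ls_error_sq(pts[list(idx)], d)
    for idx in itertools.product(range(N), repeat=d + 2)
])

print(avg_c_dls_sq, e2_sq)
```

The first printed value should not exceed the second, in line with (38).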

A lower bound on $e_2^2(\mu,d)$ in terms of an integral of the GCN $c_{pol}^2$ requires the following notions of regularity of $\mu$. For $\gamma > 0$, we say that $\mu$ is $\gamma$-regular if there exists a $C \ge 1$ such that

$$\frac{t^{\gamma}}{C} \le \mu(B(x,t)) \le C \cdot t^{\gamma} \quad \text{for all } x \in \mathrm{supp}(\mu),\ 0 < t \le \mathrm{diam}(\mu). \tag{39}$$

We say that $\mu$ is $\gamma$-upper-regular if the upper bound of (39) holds. We call the minimal such constant $C$ satisfying (39) (or its right hand side for upper-regular measures) the regularity constant of $\mu$.

Using these notions we formulate the following lower bound on $e_2(\mu,d)$ and verify it in the following section.

Theorem 6.1. If $\mu$ satisfies either one of the following conditions: $\mu$ is $\gamma$-upper-regular for $\gamma > 2$, or $\mu$ is $\gamma$-regular for $\gamma > 1$ with $d = 1$, then

$$\int_{H^{d+2}} c_{pol}^2(X)\, d\mu^{d+2}(X) \lesssim e_2^2(\mu,d), \tag{40}$$

where the comparison only depends on $d$, $\mathrm{diam}(\mu)$ and the regularity constant of $\mu$.
6.1 Proof of Theorem 6.1

The proof of this theorem is technically detailed; however, it is based on a few elementary ideas. It starts by replacing the integral of $c_{pol}^2(X)$ with the integral of $\mathrm{diam}^2(X) \cdot p_d\sin^2_{x_0}(X)$ by using a change of variables. Next, in view of (30), $\mathrm{diam}(X) \cdot p_d\sin_{x_0}(X)$ is controlled by $c_{dls}(X)$ for well-scaled simplices at $x_0$ (recalling that these are simplices for which the minimal edge length at $x_0$ is comparable to the maximal edge length at $x_0$; see Section 2). Therefore, by applying Proposition 6.1 the integral of $\mathrm{diam}^2(X) \cdot p_d\sin^2_{x_0}(X)$ over well-scaled simplices is controlled by $e_2^2(\mu,d)$.

The proof thus only requires the control of the integral of $\mathrm{diam}^2(X) \cdot p_d\sin^2_{x_0}(X)$ on poorly scaled simplices in $H^{d+2}$ (i.e., simplices which are not well-scaled). The idea follows the procedure of geometric multipoles [14, Section 9], which uses a multiscale decomposition of the integral and finds local control according to the goodness of approximation by $d$-flats at different scales and locations. While in [14] we sought local control in terms of multiscale best fit $d$-flats, in the current work we seek local control in terms of a global best fit $d$-flat.

6.1.1 Preliminary Notation and Conventions 

For simplicity, we assume throughout the proof that $\mathrm{diam}(\mu) = 1$ and thus suppress estimates depending on $\mathrm{diam}(\mu)$.

For $X \in H^{d+2}$ with $x_0 = (X)_0$, we frequently refer to $\min(X)$, $\min_{x_0}(X)$, $\max_{x_0}(X)$ and $\mathrm{scale}_{x_0}(X)$ defined in Section 2. We decompose the set of simplices with non-zero edge lengths according to the following sets indexed by $k, i \in \mathbb{N}_0$:

$$S_{i,k} = \big\{ X = (x_0,\ldots,x_{d+1}) \in H^{d+2} : \max\nolimits_{x_0}(X) \in (1/2^{i+1}, 1/2^{i}] \text{ and } \mathrm{scale}_{x_0}(X) \in (1/2^{k+1}, 1/2^{k}] \big\}. \tag{41}$$

We will mainly use their following subsets:

$$S'_{i,k} = \big\{ X \in S_{i,k} : \min\nolimits_{x_0}(X) = \|x_1 - x_0\| \text{ and } \max\nolimits_{x_0}(X) = \|x_2 - x_0\| \big\}.$$

For $x_0 \in H$ and $\ell \in \mathbb{N}_0$ we denote the annulus centered at $x_0$ and of "radius" $1/2^{\ell}$ by

$$A(x_0,\ell) = \{ x \in H : 1/2^{\ell+2} < \|x - x_0\| \le 1/2^{\ell} \}, \tag{42}$$

and we note that for all $X \in S_{i,k}$ and fixed $1 \le j \le d+1$ we have that

$$(X)_j \in A(x_0,\ell) \text{ for some } i \le \ell \le i+k \text{ depending on } X \text{ and } j. \tag{43}$$
6.1.2 Case I: $\mu$ Upper-Regular for $\gamma > 2$

We decompose the integral of $c_{pol}^2$ by using the sets of (41) and applying symmetry properties of the polar sine:

$$\begin{aligned} \int_{H^{d+2}} c_{pol}^2(X)\, d\mu^{d+2}(X) &= \int_{H^{d+2}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \\ &= \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} \int_{S_{i,k}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \\ &\le d \cdot (d+1) \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} \int_{S'_{i,k}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X). \end{aligned} \tag{44}$$

The elements of the last double sum of (44) that correspond to $k = 0$ can be controlled by combining (30) (where here $\mathrm{scale}_{x_0}(X) > 1/2$) and Proposition 6.1, thus obtaining

$$\sum_{i=0}^{\infty} \int_{S'_{i,0}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) = \int_{\bigcup_{i} S'_{i,0}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \lesssim e_2^2(\mu,d). \tag{45}$$

We now find sufficient bounds for the other terms in the last double sum of (44) to obtain convergence in $i$ and $k$. Applying (30) to a fixed term of the last double sum of (44) we obtain the following bound for an arbitrary $d$-flat $L$:

$$\int_{S'_{i,k}} \mathrm{diam}^2(X) \cdot p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \lesssim \sum_{j=0}^{d+1} \int_{S'_{i,k}} \frac{\mathrm{dist}^2(x_j,L)}{\mathrm{scale}^2_{x_0}(X)}\, d\mu^{d+2}(X) \lesssim 2^{2k} \sum_{j=0}^{d+1} \int_{S'_{i,k}} \mathrm{dist}^2(x_j,L)\, d\mu^{d+2}(X). \tag{46}$$

We claim that for all $0 \le j \le d+1$:

$$\inf_{L \in AG_d(H)} \int_{S'_{i,k}} \mathrm{dist}^2(x_j,L)\, d\mu^{d+2}(X) \lesssim (1/2^{k+i})^{\gamma} \cdot (1/2^{i})^{\gamma \cdot d} \cdot e_2^2(\mu,d). \tag{47}$$

To see this it is sufficient to integrate with respect to Xj last, and depending on the index j, to 
vary the order of integration of the other variables slightly. If j > then we take the integration 
with respect to xo as the second to last integration. If j > 1, then we integrate with respect to 
$x_1$, then $x_0$ and then finally $x_j$. Following this procedure, (47) clearly follows from the combination of (43) with the upper-regularity. Indeed, the factor $(1/2^{k+i})^{\gamma}$ arises from the integration over the coordinate $x_1$ if $j \ne 1$ and $x_0$ if $j = 1$, $(1/2^{i})^{\gamma \cdot d}$ from the rest of the coordinates excluding $x_j$, and clearly $e_2^2(\mu,d)$ from the coordinate $x_j$.

Applying (47) to the RHS of (46), we obtain that

$$\int_{S'_{i,k}} \mathrm{diam}^2(X) \cdot p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \lesssim 1/2^{(\gamma-2) \cdot k} \cdot 1/2^{\gamma (d+1) \cdot i} \cdot e_2^2(\mu,d). \tag{48}$$
Finally, combining (44), (45) and (48) we conclude that

$$\int_{H^{d+2}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) = \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} \int_{S_{i,k}} \mathrm{diam}^2(X)\, p_d\sin^2_{x_0}(X)\, d\mu^{d+2}(X) \lesssim \Big( 1 + \sum_{i=0}^{\infty} \sum_{k=1}^{\infty} 1/2^{(\gamma-2) \cdot k} \cdot 1/2^{\gamma (d+1) \cdot i} \Big) \cdot e_2^2(\mu,d).$$

Since the coefficient on the RHS above is clearly finite for $\gamma > 2$, the theorem is thus proved for the current case.
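
To make the finiteness explicit, the double sum can be evaluated in closed form. The following worked equation assumes that the summand is $2^{-(\gamma-2)k} \cdot 2^{-\gamma(d+1)i}$, as suggested by (48), and uses only the geometric series:

```latex
\sum_{i=0}^{\infty} \sum_{k=1}^{\infty} 2^{-(\gamma-2)k} \, 2^{-\gamma(d+1)i}
  = \frac{2^{-(\gamma-2)}}{1 - 2^{-(\gamma-2)}} \cdot \frac{1}{1 - 2^{-\gamma(d+1)}}
  = \frac{1}{\big(2^{\gamma-2} - 1\big)\big(1 - 2^{-\gamma(d+1)}\big)} .
```

The sum over $k$ converges precisely when $\gamma > 2$, and the resulting coefficient blows up as $\gamma \to 2^{+}$, which is consistent with the restriction in Case I.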

6.1.3 Case II: $\mu$ is $\gamma$-Regular for $\gamma > 1$ and $d = 1$

Since $d = 1$ we work with triangles $X = (x_0,x_1,x_2) \in H^3$ and we need to prove that

$$\int_{H^3} \mathrm{diam}^2(X) \sin^2_{x_0}(X)\, d\mu^3(X) \lesssim e_2^2(\mu,1). \tag{49}$$

The procedure here is similar to that of Section 6.1.2; however, we must use an inequality for the sine function that holds with high probability for a $\gamma$-regular $\mu$ with $\gamma > 1$. We clarify this as follows.

For fixed $X = (x_0,x_1,x_2) \in H^3$, $X(u,1)$ as in (7), and $C \ge 1$, let

$$U(X,C) := \{ u \in H : |\sin_{x_0}(X)| \le C \cdot |\sin_{x_0}(X(u,1))| \}, \tag{50}$$

and for $\alpha > 0$ let $A_{\alpha}(X,C)$ denote the restriction of $U(X,C)$ to an annulus:

$$A_{\alpha}(X,C) := U(X,C) \cap B(x_0, \max\nolimits_{x_0}(X)) \setminus B(x_0, \alpha \cdot \max\nolimits_{x_0}(X)). \tag{51}$$

The following lemma shows that the defining inequality of (50) occurs with high probability (we delay its proof to Section 6.1.4).

Lemma 6.1.1. If $\mu$ is $\gamma$-regular for $\gamma > 1$ with regularity constant $C_\mu$, and the constants $C_0$ and $\alpha_0$ are such that

$$C_0 \ge \frac{1}{2} \cdot \big(4 \cdot 5^{\gamma/2} \cdot C_\mu^2\big)^{\frac{1}{\gamma-1}} \quad \text{and} \quad 0 < \alpha_0 \le (4 \cdot C_\mu^2)^{-1/\gamma}, \tag{52}$$

then the following inequality holds uniformly for all $X \in \mathrm{supp}(\mu)^3$:

$$\mu(A_{\alpha_0}(X,C_0)) \ge \frac{1}{2} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))).$$

For the rest of this section we use the optimal values of the constants $C_0$ and $\alpha_0$ in (52) (i.e., the lower bound for $C_0$ and upper bound for $\alpha_0$). We decompose all triangles with non-zero edge lengths into the sets

$$S_k = \{ X \in H^3 : \mathrm{scale}_{x_0}(X) \in (\alpha_0^{k+1}, \alpha_0^{k}] \},$$

for $k \ge 0$, and we denote

$$S'_k = \{ X \in S_k : \max\nolimits_{x_0}(X) = \|x_2 - x_0\| \} \subseteq S_k.$$

By the symmetry of $|\sin_{x_0}(X)|$ with respect to $x_1$ and $x_2$, we note that

$$\int_{H^3} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X) = \sum_{k=0}^{\infty} \int_{S_k} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X) \le 2 \sum_{k=0}^{\infty} \int_{S'_k} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X). \tag{53}$$

We note that $X \in S'_0$ is well-scaled (for $\alpha_0$), and by combining (30) and Proposition 6.1 we see that

$$\int_{S'_0} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X) \lesssim e_2^2(\mu,1).$$

We now use Lemma 6.1.1 to control the individual terms for $k \ge 1$ on the RHS of (53). We arbitrarily fix $X \in S'_k$ and define the probability measure

$$\tilde{\mu}_X = \frac{\mu|_{A_{\alpha_0}(X,C_0)}}{\mu(A_{\alpha_0}(X,C_0))}.$$

We note that for any $y \in A_{\alpha_0}(X,C_0)$:

$$\mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X) \le C_0^2 \cdot \mathrm{diam}^2(X(y,1)) \cdot \sin^2_{x_0}(X(y,1)),$$

and consequently for $X(y,1)$ as in (7),

$$\mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X) \le C_0^2 \int_{A_{\alpha_0}(X,C_0)} \mathrm{diam}^2(X(y,1)) \cdot \sin^2_{x_0}(X(y,1))\, d\tilde{\mu}_X(y). \tag{54}$$

Since the triangle $X(y,1)$ is well-scaled for each $y \in A_{\alpha_0}(X,C_0)$, we can apply the inequality of (30) to the integrand on the RHS of (54) to obtain

$$\mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X) \lesssim \mathrm{dist}^2(x_0,L) + \int_{A_{\alpha_0}(X,C_0)} \mathrm{dist}^2(y,L)\, d\tilde{\mu}_X(y) + \mathrm{dist}^2(x_2,L) \tag{55}$$

for any $L \in AG_1(H)$, where the constant of the inequality is independent of $k$.

Fixing the line $L$, the middle term on the RHS of (55) trivially has the bound

$$\int_{A_{\alpha_0}(X,C_0)} \mathrm{dist}^2(y,L)\, d\tilde{\mu}_X(y) \le \frac{1}{\mu(A_{\alpha_0}(X,C_0))} \int_H \mathrm{dist}^2(x,L)\, d\mu(x). \tag{56}$$

Thus, applying (55) to (53), and then (56) to (55), for an arbitrary line $L$ we have

$$\int_{S'_k} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X) \lesssim \int_{S'_k} \mathrm{dist}^2(x_0,L)\, d\mu^3(X) + \int_H \mathrm{dist}^2(x,L)\, d\mu(x) \cdot \int_{S'_k} \frac{d\mu^3(X)}{\mu(A_{\alpha_0}(X,C_0))} + \int_{S'_k} \mathrm{dist}^2(x_2,L)\, d\mu^3(X). \tag{57}$$

We bound the terms of (57) separately. The first term satisfies

$$\int_{S'_k} \mathrm{dist}^2(x_0,L)\, d\mu^3(X) \le \int_{H^2} \mathrm{dist}^2(x_0,L) \Big( \int_{B(x_0,\, \alpha_0^{k} \cdot \|x_2 - x_0\|)} d\mu(x_1) \Big) d\mu(x_2)\, d\mu(x_0) \lesssim \alpha_0^{k \cdot \gamma} \cdot \int_H \mathrm{dist}^2(x,L)\, d\mu(x). \tag{58}$$

A similar computation gives the same bound for the third term.

Then, by Lemma 6.1.1 and the regularity of $\mu$ we have that $\mu(A_{\alpha_0}(X,C_0)) \approx \max_{x_0}(X)^{\gamma}$, and thus the second term of (57) satisfies

$$\int_H \mathrm{dist}^2(x,L)\, d\mu(x) \cdot \int_{S'_k} \frac{d\mu^3(X)}{\mu(A_{\alpha_0}(X,C_0))} \lesssim \int_H \mathrm{dist}^2(x,L)\, d\mu(x) \cdot \int_{S'_k} \frac{d\mu^3(X)}{\max_{x_0}(X)^{\gamma}} \lesssim \alpha_0^{k \cdot \gamma} \cdot \int_H \mathrm{dist}^2(x,L)\, d\mu(x). \tag{59}$$

Applying (58) and (59) to the terms of (57) we have the bound

$$\int_{S'_k} \mathrm{diam}^2(X) \cdot \sin^2_{x_0}(X)\, d\mu^3(X) \lesssim \alpha_0^{k \cdot \gamma} \cdot \int_H \mathrm{dist}^2(x,L)\, d\mu(x). \tag{60}$$

Taking an infimum over $L \in AG_1(H)$ on the RHS of (60), and then summing this inequality over $k \ge 1$, we see that the theorem holds in this case.

6.1.4 Proof of Lemma 6.1.1

The inequality of Lemma 6.1.1 is a direct consequence of the following two inequalities:

$$\mu\big(B(x_0, \max\nolimits_{x_0}(X)) \setminus B(x_0, \alpha_0 \cdot \max\nolimits_{x_0}(X))\big) \ge \frac{3}{4} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))) \tag{61}$$

and

$$\mu(U(X,C_0)) \ge \frac{3}{4} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))). \tag{62}$$

The inequality of (61) follows from the $\gamma$-regularity of $\mu$ and the constant $\alpha_0$. Indeed,

$$\mu\big(B(x_0, \max\nolimits_{x_0}(X)) \setminus B(x_0, \alpha_0 \cdot \max\nolimits_{x_0}(X))\big) \ge \mu(B(x_0, \max\nolimits_{x_0}(X))) - C_\mu \cdot \alpha_0^{\gamma} \cdot \max\nolimits_{x_0}(X)^{\gamma} \ge \frac{3}{4} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))).$$

We conclude by proving (62). We form the tube of radius $\max_{x_0}(X)/C_0$ on the line $L[X(1)]$,

$$T_{ube}(L[X(1)], \max\nolimits_{x_0}(X)/C_0) = \{ y : \mathrm{dist}(y, L[X(1)]) \le \max\nolimits_{x_0}(X)/C_0 \},$$

and note that

$$\big(T_{ube}(L[X(1)], \max\nolimits_{x_0}(X)/C_0)\big)^c \cap B(x_0, \max\nolimits_{x_0}(X)) \subseteq U(X,C_0). \tag{63}$$

Indeed, since $|\sin_{x_0}(X(u,1))| \cdot \|u - x_0\| = \mathrm{dist}(u, L[X(1)])$ we have the following lower bound for any $u \in B(x_0, \max_{x_0}(X))$:

$$|\sin_{x_0}(X(u,1))| \cdot \max\nolimits_{x_0}(X) \ge \mathrm{dist}(u, L[X(1)]). \tag{64}$$

Applying (64) to $u \in (T_{ube}(L[X(1)], \max_{x_0}(X)/C_0))^c \cap B(x_0, \max_{x_0}(X))$, we obtain that

$$C_0 \cdot |\sin_{x_0}(X(u,1))| > 1 \ge |\sin_{x_0}(X)|,$$

i.e., $u \in U(X,C_0)$ and (63) is concluded.

At last, we show that

$$\mu\big(T_{ube}(L[X(1)], \max\nolimits_{x_0}(X)/C_0) \cap B(x_0, \max\nolimits_{x_0}(X))\big) \le \frac{1}{4} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))) \tag{65}$$

and combining it with (63) we establish (62). We first note that the intersection of the tube $T_{ube}(L[X(1)], \max_{x_0}(X)/C_0)$ with $B(x_0, \max_{x_0}(X))$ can be covered by at most $2 \cdot C_0$ balls of radius $\sqrt{5} \cdot \max_{x_0}(X)/(2 \cdot C_0)$ and thus

$$\mu\big(T_{ube}(L[X(1)], \max\nolimits_{x_0}(X)/C_0) \cap B(x_0, \max\nolimits_{x_0}(X))\big) \le C_\mu^2 \cdot 5^{\gamma/2} \cdot (2 \cdot C_0)^{1-\gamma} \cdot \mu(B(x_0, \max\nolimits_{x_0}(X))). \tag{66}$$

Equation (65) and consequently (62) follow by combining (52) with (66).
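
The role of the two constants can be checked with a few lines of arithmetic. The sketch below assumes the reading $C_0 = \frac{1}{2}(4 \cdot 5^{\gamma/2} \cdot C_\mu^2)^{1/(\gamma-1)}$ and $\alpha_0 = (4 \cdot C_\mu^2)^{-1/\gamma}$ for the optimal values in (52), and the tube estimate $C_\mu^2 \cdot 5^{\gamma/2} \cdot (2 C_0)^{1-\gamma}$ from (66); the sample values of $C_\mu$ and $\gamma$ are arbitrary and the function name is ours.

```python
def lemma_constants(C_mu, gamma):
    """Optimal C_0 and alpha_0 under the assumed reading of (52)."""
    C0 = 0.5 * (4.0 * 5.0 ** (gamma / 2.0) * C_mu ** 2) ** (1.0 / (gamma - 1.0))
    alpha0 = (4.0 * C_mu ** 2) ** (-1.0 / gamma)
    return C0, alpha0


C_mu, gamma = 2.0, 1.5               # hypothetical regularity constant and exponent
C0, alpha0 = lemma_constants(C_mu, gamma)

# Fraction of mu(B(x_0, max)) that the tube can occupy, cf. (66).
tube_fraction = C_mu ** 2 * 5.0 ** (gamma / 2.0) * (2.0 * C0) ** (1.0 - gamma)

# Fraction of mu(B(x_0, max)) that the inner ball can occupy, cf. (61).
inner_ball_fraction = C_mu ** 2 * alpha0 ** gamma

# What remains for the annulus restricted to U(X, C_0).
remaining = 1.0 - tube_fraction - inner_ball_fraction
print(C0, alpha0, tube_fraction, inner_ball_fraction, remaining)
```

With these choices both fractions equal $1/4$, which leaves at least half of the mass of the ball, matching the statement of the lemma.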

7 GCNs on and i7 d and their Corresponding Comparisons 

7.1 d-Dimensional GCNs of Only d+ 1 Variables 

If one knows a point that lies on a LS ti-flat, then any of the above GCNs can be reduced to a 
function of only d + 1 variables by arbitrarily fixing one of the original variables at that point. We 
exemplify this idea with the center of which lies on the LS d-flat (see Proposition 13. 1 j) 

and later explain how to extend it to other points. 

We consider the set of (d + l)-simplices with a fixed vertex at x cm , that is, we define the set 
H x +2 = {^cm} x H d+l , and we restrict our attention to the elements 

X = {x cm ,x u ...,x d+1 )e H d x ll • ( 67 ) 

As such, we can replace any GCN, c : H d+2 — > R, by c(X) : H d ^ — > M and establish the relevant 
comparisons as follows, where fi d+1 on is clearly on the set H d+1 . 

Proposition 7.1. If fi is centrally d-separated (foruj and e) then 

4M < -5^3 [ M cl ol jX)d/ +1 (X). (68) 
u z ■ e a M d+1 
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If on the other hand \i satisfies either one of the following conditions: [i is ^-upper-regular for 
7 > 2 or [i is ^-regular for 7 > 1 with d = 1, then 

f c 2 pol (X)d t i d+1 (X)<el( t ,,d), (69) 

where the comparison only depends on d, diam(/x) and the regularity constant of \x. 
Moreover, if [i is an arbitrary Borel probability measure on H , then 

[ c 2 dls (X)d/ +1 (X) rCe^d). (70) 



Proof. The proofs of (68) and (69) are identical to those of the corresponding comparisons on $H^{d+2}$ (the latter being (10)), while they
also use the fact that $x_{cm}$ lies in the LS d-flat.

In order to prove (70), we apply (28) and obtain the following for any fixed $L \in AG_d(H)$:

$$\int_{H^{d+1}_{cm}} c_{dls}^2(X)\, d\mu^{d+1}(X) \le \frac{1}{d+2} \left( \sum_{i=1}^{d+1} \int_{H^{d+1}} \mathrm{dist}^2(x_i, L)\, d\mu^{d+1}(X) + \int_{H^{d+1}} \mathrm{dist}^2(x_{cm}, L)\, d\mu^{d+1}(X) \right).$$

Since the function $\mathrm{dist}^2(\cdot, L)$ is convex for fixed L, we apply Jensen's Inequality to the last term
on the RHS and then Fubini's Theorem to all terms and thus conclude (70). □
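
The concluding step of the proof can be unpacked as follows. This is a sketch rather than the authors' wording: it assumes that the bound obtained from (28) carries the normalization $1/(d+2)$, and it uses that $\mu$ is a probability measure with center of mass $x_{cm} = \int_H x \, d\mu(x)$.

```latex
% Jensen's inequality for the convex function dist^2(., L):
\mathrm{dist}^2(x_{cm}, L) \;\le\; \int_H \mathrm{dist}^2(x, L)\, d\mu(x).

% Fubini's theorem, for each 1 <= i <= d+1 (the other d coordinates integrate to 1):
\int_{H^{d+1}} \mathrm{dist}^2(x_i, L)\, d\mu^{d+1}(X) \;=\; \int_H \mathrm{dist}^2(x, L)\, d\mu(x).

% Summing the d+1 equal terms and the Jensen bound:
\int_{H^{d+1}_{cm}} c_{dls}^2(X)\, d\mu^{d+1}(X)
  \;\le\; \frac{(d+1) + 1}{d+2} \int_H \mathrm{dist}^2(x, L)\, d\mu(x)
  \;=\; \int_H \mathrm{dist}^2(x, L)\, d\mu(x).
```

Taking the infimum over $L \in AG_d(H)$ on the right-hand side then gives (70).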



We note that when using a fixed point y on the LS d-flat instead of $x_{cm}$, only a few modifications
are needed. First of all, the minimizations defining both $e_2(\mu, d)$ and $c_{dls}$ need to be restricted
to subspaces in $AG_d(H)$ containing the point y. Also, d-separation needs to be defined w.r.t. y
(instead of $x_{cm}$). When clustering d-dimensional linear subspaces, the LSCC algorithm (linear
SCC) [5j 3] applies such a strategy with y = 0, which obviously lies on all linear subspaces. 

7.2 A d-Dimensional GCN of Only d Variables 

The work of Deshpande et al. [7, 8] suggests a GCN on $H^d$, which we denote by $c_{Dsh}$. The idea is
to look at the geometry of d-simplices having the center of mass, $x_{cm}$, as a fixed vertex. As such,
we make the definition $H^d_{cm} = \{x_{cm}\} \times H^d$, and we consider d-simplices

$$X = (x_{cm}, x_1, \ldots, x_d) \in H^d_{cm}.$$

We note that $X \in H^d_{cm}$ is simply the projection of some $\tilde{X} \in H^{d+1}_{cm}$, i.e., $X = \tilde{X}(d+1)$.
We define the square of $c_{Dsh}(X)$ in the following way:



Mj(X) J 6ist'(y,L[X\)diM(y) 

c Dsh(^) = ? j 

/ M 2 d (Y)dAY) 

where $\mu^d$ on $H^d_{cm}$ is clearly taken on the set $H^d$. For a fixed $X \in H^d_{cm}$, the GCN $c_{Dsh}^2(X)$ is simply
the average squared volume of (d + 1)-simplices having X as a d-dimensional face, divided by the
average squared volume of d-simplices with the center of mass as a vertex. This follows directly
from the formula expressing the volume of a simplex through the volume of one of its faces and the distance of the remaining vertex to the flat spanned by that face.

The comparison of $e_2^2(\mu, d)$ and the integral of $c_{Dsh}^2$ is established as follows.
Theorem 7.1. If $\mu$ is centrally d-separated with compact support, then

e§(M, d) < -^— d I c 2 Dsh (X) d/(X). (71) 

If on the other hand $\mu$ is an arbitrary Borel probability measure on H, then

cl sh (X)dfi d (X)<e 2 2 (f,,d). (72) 



J si 



Proof. In order to simplify the argument, we introduce a related GCN on (d + 1)-simplices with a
vertex fixed at $x_{cm}$. That is, we look at (d + 1)-simplices per (67), i.e., $X \in H^{d+1}_{cm}$, and the GCN
$c_{vol,Dsh}(X)$, whose square is defined by



Mi. i ( x ) 

4 oWsh (X) = • (73) 

M 2 (Y)dfi d (Y) 



Hi. 



Per the volume formula cited above and Fubini's Theorem we see that the corresponding integrals of the two squared
GCNs, $c_{Dsh}^2(X)$ and $c_{vol,Dsh}^2(X)$, are equal, i.e.,

$$\int_{H^d_{cm}} c_{Dsh}^2(X)\, d\mu^d(X) = \int_{H^{d+1}_{cm}} c_{vol,Dsh}^2(X)\, d\mu^{d+1}(X). \tag{74}$$

We will thus prove Theorem 7.1 with the simpler GCN $c_{vol,Dsh}(X)$. Theorem 9.1 will immediately
follow from our estimates below.

We first note that (71) follows from (68) and (74), as well as the following fact:

Cvol.Dsh 

(X) VX€H*£. 

In order to prove (72) we generalize [3, Lemma 3.1] to our continuous setting by proving the identity:
f d+ M 2 d+1 {X)d^\X)= Yl ( 75 ) 

" H:r.r-™ 1 ^ ^+ . . . 



l<t 1 <...<t d+ 

where $\{\sigma_i\}_{i \in \mathbb{N}}$ are the singular values of the data-to-features operator $A_\mu$. We obtain (75) by
expanding the expression $\det(I + \lambda A_\mu A_\mu^*)$ in $\lambda \in \mathbb{R}$ in two different ways and equating the corresponding coefficients.

We first apply [10, Theorem IV.6.1] to obtain that

$$\det(I + \lambda A_\mu A_\mu^*) = \prod_{j \in \mathbb{N}} (1 + \lambda \sigma_j^2) = 1 + \sum_{k \in \mathbb{N}} \lambda^k \sum_{j_1 < \cdots < j_k \in \mathbb{N}} \sigma_{j_1}^2 \cdots \sigma_{j_k}^2. \tag{76}$$

Next, in view of the definition of $A_\mu$, we express the operator $A_\mu A_\mu^* : L_2(\mu) \to L_2(\mu)$ as follows:

$$(A_\mu A_\mu^* f)(y) = \int_H \langle x - x_{cm}, y - x_{cm} \rangle f(x)\, d\mu(x), \quad \text{for all } f \in L_2(\mu) \text{ and } y \in H, \tag{77}$$

so that it has the kernel

$$k(y, x) = \langle x - x_{cm}, y - x_{cm} \rangle. \tag{78}$$

By adapting the proof of [10, Theorem VI.1.1] to the operator $\lambda A_\mu A_\mu^*$ with the kernel $\lambda k(y, x)$ and
the compactly supported measure $\mu$ we obtain that
det(J + XA^A*) = 1 + V — - / det({fc(xi, Xj)}? j=1 ) dn(xi) ■ ■ • M^ m ) = 

meN m - Jh™ 

l+^A m / M 2 m (Y)dr(Y), (79) 

where $H^m_{cm} = \{x_{cm}\} \times H^m$. Equation (75) thus immediately follows from both (76) and (79).
We will also use the following immediate estimate:

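$$\sum_{1 \le t_1 < \cdots < t_{d+1}} \sigma_{t_1}^2 \cdots \sigma_{t_{d+1}}^2 \le \sum_{1 \le t_1 < \cdots < t_d} \sigma_{t_1}^2 \cdots \sigma_{t_d}^2 \cdot \sum_{j=d+1}^{\infty} \sigma_j^2. \tag{80}$$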
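
The two facts about singular values used here, the product expansion in (76) and the tail estimate (80), can be checked numerically for a finite list of numbers. The sketch below is only an illustration: the function names and the sample values are ours, and a finite list stands in for the (possibly infinite) sequence of squared singular values.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """Sum of products over all k-element index subsets (equals 1 for k = 0)."""
    return sum(prod(c) for c in combinations(values, k))


def det_expansion(values, lam):
    """Both sides of prod_j (1 + lam * s_j) = sum_k lam^k * e_k(s), as in (76)."""
    lhs = prod(1.0 + lam * s for s in values)
    rhs = sum(lam ** k * elementary_symmetric(values, k)
              for k in range(len(values) + 1))
    return lhs, rhs


def tail_estimate(values, d):
    """Both sides of (80): e_{d+1}(s) <= e_d(s) * sum_{j > d} s_j."""
    lhs = elementary_symmetric(values, d + 1)
    rhs = elementary_symmetric(values, d) * sum(values[d:])
    return lhs, rhs


# Hypothetical squared singular values sigma_j^2, chosen for illustration only.
squared_singular_values = [4.0, 2.0, 1.0, 0.5, 0.25]
```

The estimate holds because every (d + 1)-subset of indices has its largest index at least d + 1, so removing that index maps it injectively to a pair counted on the right-hand side.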



Now, combining (16), (75) and (80), we get that

/ M2 +1 (X)d/ +1 (X)< / Mi(X)d/(X). e |GM), 



that is, 



/ ^ cL,DshWd/ +1 (X) < e|(M) 
and combining it with (74) we conclude (72) and thus Theorem 7.1. □



8 Statistical Relevance of This Work 

8.1 Application to the Problem of Clustering Subspaces 

The identity of ([!]) is useful for clustering algorithms based on pairwise distances (see e.g., [3]). 
Similarly, the approximate identities of this paper are also useful for clustering algorithms based 
on higher-order correlations [TH [H [201 G3 111 12] • The latter algorithms are designed to cluster 
intersecting subspaces or manifolds, where the former algorithms fail. For example, the Spectral 
Curvature Clustering (SCC) [5] is an algorithm for clustering d-dimensional affine subspaces. It
assigns to any d + 2 data points, $x_1, \ldots, x_{d+2}$, the affinity $e^{-c_{pol}^2(x_1, \ldots, x_{d+2})/(2\sigma^2)}$, where $c_{pol}$ is the
polar GCN and $\sigma$ is a positive tuning parameter that can be estimated from the data. It then
organizes these affinities in a matrix whose spectral properties provide the clusters. We remark
that $c_{pol}$ was referred to in [5] as curvature, instead of GCN, and this resulted in the algorithm's
name SCC.
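
The pairwise identity mentioned at the beginning of this subsection (the double integral of squared distances is proportional to the variance) can be checked directly for an empirical measure. The sketch below assumes the uniform measure on a finite point set; the proportionality factor 2 is the standard one for the variance taken about the center of mass, and the function names and sample points are ours.

```python
def mean_pairwise_sq_dist(points):
    """Average of ||x - y||^2 over all ordered pairs (x, y) of sample points."""
    n = len(points)
    total = 0.0
    for p in points:
        for q in points:
            total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / n ** 2


def variance(points):
    """Average squared distance of the sample points to their center of mass."""
    n = len(points)
    dim = len(points[0])
    center = [sum(p[k] for p in points) / n for k in range(dim)]
    return sum(sum((p[k] - center[k]) ** 2 for k in range(dim))
               for p in points) / n


# Illustrative planar sample (not data from the paper).
sample_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 3.0)]
pairwise = mean_pairwise_sq_dist(sample_points)
spread = variance(sample_points)
```

For this sample the pairwise average equals twice the variance, which is the d = 0 analogue of the comparisons between integrals of squared GCNs and LS errors.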

The results of the current paper have been used to justify the SCC algorithm [I]. More precisely, 
[4] assumed data sampled from a mixture of subspaces corrupted by sufficiently small noise and 
showed that the underlying subspaces could be recovered with sufficiently large probability and 
small error. This error was controlled by two terms: a sum of within-clusters errors scaled by $\sigma^2$
(where $\sigma$ is the tuning parameter used to define the affinities) and between-clusters interaction.
The control of the first term (involving within-clusters errors) was established by some of the theory 
proved here. This theory is simpler and more general than the one referred to in [U Section 2.3]. 



8.2 From Estimates in Expectation to Estimates in High Probability 

We extend the comparisons of the two expected quantities (i.e., the LS error, which is the expectation of
$\mathrm{dist}^2(x, L)$, and the expectation of squared GCNs) to comparisons of their estimators obtained by
i.i.d. samples from $\mu$. That is, assume N H-valued i.i.d. random variables drawn from $\mu$, denoted
by $X_1, \ldots, X_N$. We can estimate the LS error and any of the integrals of squared GCNs (assume
for simplicity $c_{dls}^2$) as follows:

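$$e_2^2(X_1, \ldots, X_N; d) = \frac{1}{N} \min_{L \in AG_d(H)} \sum_{i=1}^{N} \mathrm{dist}^2(X_i, L) \tag{81}$$

and

$$c_{dls}^2(X_1, \ldots, X_N; d) = \frac{1}{N^{d+2}} \sum_{\substack{X = (X_{i_1}, \ldots, X_{i_{d+2}}) \\ 1 \le i_1, \ldots, i_{d+2} \le N}} c_{dls}^2(X). \tag{82}$$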

The following theorem shows that these two quantities are comparable to each other with high 
probability of sampling. 
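
For intuition about the estimator (81), the following sketch computes it in the simplest case of points in the plane and d = 1. It relies on the standard fact that a best-fit line passes through the center of mass and that the minimal average squared distance equals the smaller eigenvalue of the sample covariance matrix; this closed form, the function names and the sample are our assumptions and are not taken from the paper.

```python
from math import cos, pi, sin, sqrt


def centroid(points):
    """Center of mass of a planar sample."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def empirical_ls_error_line(points):
    """(1/N) * min over lines L of sum_i dist^2(X_i, L), for planar points."""
    n = len(points)
    cx, cy = centroid(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Smaller eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]].
    return (sxx + syy) / 2 - sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)


def mean_sq_dist_to_line(points, base, angle):
    """Average squared distance to the line through `base` with the given angle."""
    nx, ny = -sin(angle), cos(angle)  # unit normal of the line
    return sum(((x - base[0]) * nx + (y - base[1]) * ny) ** 2
               for x, y in points) / len(points)


# Illustrative sample; the best-fit line is the horizontal axis.
sample = [(-2.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
closed_form = empirical_ls_error_line(sample)
brute_force = min(mean_sq_dist_to_line(sample, centroid(sample), k * pi / 360)
                  for k in range(360))
```

The brute-force search over lines through the center of mass agrees with the closed form on this sample, which illustrates why fixing a vertex at $x_{cm}$ loses nothing in Section 7.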

Theorem 8.1. If $\mu$ is d-separated (for $\omega$ and $\epsilon$) and $X_1, \ldots, X_N$ are N H-valued i.i.d. random variables
drawn from $\mu$, then for any $0 < \delta < 1$ and

$$\kappa = \frac{\delta}{(d+2) \cdot \mathrm{diam}(\mu)^2} \int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X),$$

the following estimate holds with probability $1 - 2 \cdot e^{-2 N \kappa^2}$:



c dis(^i' ■ ■ ■ >%N;d) < e 2 (Xi, . . . ,XN]d) < + • — ^ — j- r • Cj ls (Xi, . . . ,X]y;d). (83) 



Moreover, for any $0 < \delta < \epsilon$ the following estimate holds with probability $1 - (d+1) \cdot e^{-2N\delta^2}$:



c dis(-£i> • • • ,%N',d) < el(Xi, . . . ,XN;d) < ^2 T e _ ' c dis(^i' • • • >%N':d). (84) 



Proof. The LHS inequality of both (83) and (84) is proved identically to (38) and in fact is a
deterministic inequality.
We first verify the RHS inequality of (83) by estimating with probability the integral quantities
by their discrete counterparts (via concentration inequalities). In order to estimate the integral
of $c_{dls}^2$ by $c_{dls}^2(X_1, \ldots, X_N; d)$ we fix $1 \le i \le N$ and note that the number of additive terms in
$c_{dls}^2(X_1, \ldots, X_N; d)$ that contain $X_i$ is $(d+2) \cdot P(N-1, d+1)$, where $P(N-1, d+1)$ denotes the
number of permutations of d + 1 elements out of N − 1. Moreover, each of these terms is between 0 and
$\mathrm{diam}(\mu)^2 / N^{d+2}$. Consequently,

$$\sup_{X_1, \ldots, X_N, \hat{X}_i} \left| c_{dls}^2(X_1, \ldots, X_i, \ldots, X_N; d) - c_{dls}^2(X_1, \ldots, \hat{X}_i, \ldots, X_N; d) \right| \le (d+2) \cdot \mathrm{diam}(\mu)^2 / N. \tag{85}$$
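
The counting behind (85) can be verified by brute force for small N and d. The sketch below assumes, as the permutation count suggests, that only index tuples with pairwise distinct entries are counted (tuples with a repeated point span at most d + 1 distinct points and hence lie on a d-flat); the function names are ours.

```python
from itertools import product
from math import perm


def count_terms_containing(n, d, i):
    """Count index tuples in {1..n}^(d+2) with distinct entries that contain i."""
    return sum(1 for t in product(range(1, n + 1), repeat=d + 2)
               if len(set(t)) == d + 2 and i in t)


def bounded_difference_factor(n, d):
    """(d+2) * P(n-1, d+1) / n^(d+2); multiply by diam^2 to bound the change."""
    return (d + 2) * perm(n - 1, d + 1) / n ** (d + 2)
```

Since $P(N-1, d+1) \le N^{d+1}$, the factor returned by `bounded_difference_factor` never exceeds $(d+2)/N$, which is the constant appearing on the right-hand side of (85).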



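Applying McDiarmid's inequality [19] with the underlying condition expressed in (85) we obtain
that for any $\beta > 0$:

$$\mu^N \left( \int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X) - c_{dls}^2(X_1, \ldots, X_N; d) > \beta \right) \le e^{-2N\beta^2 / ((d+2)^2 \cdot \mathrm{diam}(\mu)^4)}. \tag{86}$$

Setting

$$\beta = \delta \int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X), \tag{87}$$

we rewrite (86) as follows:

$$\mu^N \left( c_{dls}^2(X_1, \ldots, X_N; d) < (1 - \delta) \int_{H^{d+2}} c_{dls}^2(X)\, d\mu^{d+2}(X) \right) \le e^{-2N\beta^2 / ((d+2)^2 \cdot \mathrm{diam}(\mu)^4)}. \tag{88}$$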

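In order to estimate $e_2(\mu, d)$ by $e_2(X_1, \ldots, X_N; d)$ we note that

$$e_2^2(X_1, \ldots, X_N; d) \le \frac{1}{N} \sum_{i=1}^{N} \mathrm{dist}^2(X_i, L), \tag{89}$$

where L is a fixed LS d-flat for $\mu$. Applying Hoeffding's inequality to the function on the RHS
of (89), we obtain that

$$\mu^N \left( \frac{1}{N} \sum_{i=1}^{N} \mathrm{dist}^2(X_i, L) - e_2^2(\mu, d) > \beta \right) \le e^{-2N\beta^2 / \mathrm{diam}(\mu)^4}. \tag{90}$$

By further use of (38) and (87), we reduce (90) to the following probabilistic inequality:

$$\mu^N \left( e_2^2(X_1, \ldots, X_N; d) > (1 + \delta) \cdot e_2^2(\mu, d) \right) \le e^{-2N\beta^2 / \mathrm{diam}(\mu)^4}. \tag{91}$$

The RHS inequality of (83) thus follows from the combination of Theorem 5.1, (88), (89) and (91).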

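Next, we prove the RHS inequality of (84) by showing that d-separation of $\mu$ is maintained
(for $\omega$ and $\epsilon'$, where $\epsilon' < \epsilon$) with overwhelming probability by i.i.d. random variables sampled from
$\mu$. We arbitrarily fix j = 1, . . . , d + 1 and i = 1, . . . , N and form the random variable $\mathfrak{J}_{ij}$ by the
formula:

$$\mathfrak{J}_{ij}(x) = I_{x_i \in V_j}(x) \quad \text{for all } x \in H,$$

where I is the indicator function and $\{V_j\}_{j=1}^{d+1}$ are the sets used in defining the d-separation of $\mu$. We
note that

$$\int \mathfrak{J}_{ij}(x)\, d\mu(x) = \mu(V_j) \ge \epsilon.$$

Combining this observation with Hoeffding's inequality we obtain that

$$\mu^N \left( -\frac{1}{N} \sum_{i=1}^{N} \mathfrak{J}_{ij} + \epsilon \ge \delta \right) \le \mu^N \left( -\frac{1}{N} \sum_{i=1}^{N} \mathfrak{J}_{ij} + \mu(V_j) \ge \delta \right) \le e^{-2N\delta^2}.$$

Consequently,

$$\mu^N \left( \bigcap_{j=1}^{d+1} \left\{ \frac{1}{N} \sum_{i=1}^{N} \mathfrak{J}_{ij} > \epsilon - \delta \right\} \right) \ge 1 - (d+1) \cdot e^{-2N\delta^2}.$$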



That is, with probability $1 - (d+1) \cdot e^{-2N\delta^2}$ the empirical measure $\mu_N(A) = \sum_{i=1}^{N} I_A(X_i)/N$ is
d-separated for the parameters $\omega$ and $\epsilon - \delta$ and the same sets $\{U_j\}_{j=1}^{d+1}$, $\{V_j\}_{j=1}^{d+1}$. For each such
instance of d-separation of the empirical measure, we apply Theorem 5.1 to $\mu_N$. That is, for a fixed
sample $X_1, \ldots, X_N$ whose empirical measure is d-separated with these sets and constants $\omega$ and
$\epsilon - \delta$, we have the following inequality, which is simply Theorem 5.1 applied to $\mu_N$:

e|(^i, . . . ,Xn; d) < ^ 2 - - d+1 ■ c^siXx, . . . ,Xn] d). 

This inequality holds for all samples with probability $1 - (d+1) \cdot e^{-2N\delta^2}$ and the RHS inequality
of (84) is thus concluded. □
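
To read the probability bound in the second part of Theorem 8.1 quantitatively, one can solve $(d+1) \cdot e^{-2N\delta^2} \le \eta$ for the sample size N. The sketch below does this arithmetic; the target failure probability `eta` and the function names are ours and do not appear in the paper.

```python
from math import ceil, exp, log


def failure_probability(n, d, delta):
    """Upper bound (d+1) * exp(-2 * n * delta^2) on the failure probability."""
    return (d + 1) * exp(-2.0 * n * delta ** 2)


def samples_needed(d, delta, eta):
    """Smallest N >= 1 for which (d+1) * exp(-2 * N * delta^2) <= eta."""
    return max(1, ceil(log((d + 1) / eta) / (2.0 * delta ** 2)))


# Illustrative parameters: d = 2, delta = 0.05, failure probability at most 1%.
n_required = samples_needed(2, 0.05, 0.01)
```

The required sample size grows like $\delta^{-2}$ and only logarithmically in d and in $1/\eta$, while $\delta$ itself must stay below $\epsilon$ for the estimate (84) to apply.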



9 Discussion 



We presented examples of d-dimensional geometric condition numbers whose integrals are comparable
to the d-dimensional least squares error for certain classes of measures. We related these
results to the problem of clustering subspaces and to volume-based sampling for Monte-Carlo SVD. 
We discuss here further implications and open directions. 



9.1 Comparisons of $L_p$ Errors

For simplicity we only discussed LS errors, i.e., $L_2$ errors. Nevertheless, $L_p$ errors for $1 \le p < \infty$
can also be estimated using p-th powers of the GCNs.



9.2 Approximate Identities for Singular Values 

Some of the approximate identities established in this paper can be translated to approximate
identities involving singular values of certain operators. We exemplify this claim for the data-to-features
operator as follows.
Theorem 9.1. If $\mu$ is centrally d-separated (for $\omega$ and $\epsilon$) with compact support and $\{\sigma_i\}_{i \in \mathbb{N}}$ are
the singular values of the data-to-features operator, then

$$\sum_{j=d+1}^{\infty} \sigma_j^2 \lesssim \frac{\sum_{1 \le t_1 < \cdots < t_{d+1}} \sigma_{t_1}^2 \cdots \sigma_{t_{d+1}}^2}{\sum_{1 \le t_1 < \cdots < t_d} \sigma_{t_1}^2 \cdots \sigma_{t_d}^2} \le \sum_{j=d+1}^{\infty} \sigma_j^2. \tag{92}$$

We note that the inequality on the RHS of (92) is trivial for any set of numbers $\{\sigma_j\}_{j \in \mathbb{N}}$. The
LHS comparability is an immediate corollary of Theorem 7.1 (in view of (73)-(75)).



9.3 More Robust Notion of d-Separation 

Our notion of d-separation is not sufficiently "robust to outliers" since it depends on $\mathrm{diam}(\mu)$.
Assume, e.g., a probability measure which is a mixture of one component supported in the unit
ball and another component of an atomic measure supported on an arbitrarily far point with a
sufficiently small weight. The diameter of this measure is mainly determined by the outlying
atomic measure. However, for $X \in H^d_{cm}$ and the GCN $c_{Dsh}(X)$ (or $X \in H^{d+1}_{cm}$ and the GCN
$c_{vol,Dsh}(X)$) we can weaken the effect of outliers by replacing the condition $M_d(X) \ge \omega \cdot \mathrm{diam}(\mu)^d$
with

$$M_d^2(X) \ge \omega \int_{H^d_{cm}} M_d^2(Y)\, d\mu^d(Y). \tag{93}$$
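
The effect described above can be illustrated in the simplest case d = 1 on the real line, where $M_1(X)$ for $X = (x_{cm}, x)$ is just $|x - x_{cm}|$, so the integral on the right-hand side of (93) is the variance. The sketch below is a toy example with our own choice of atoms and weights: two bulk atoms at ±1 and an outlying atom at distance r with weight $1/r^2$.

```python
def mixture_with_outlier(r):
    """Atoms at -1 and 1 (bulk) plus an outlying atom at r carrying weight 1/r^2."""
    w = 1.0 / r ** 2
    return [-1.0, 1.0, r], [(1.0 - w) / 2, (1.0 - w) / 2, w]


def diameter_and_second_moment(points, weights):
    """Return (diam, integral of M_1^2) for a weighted atomic measure on the line."""
    center = sum(w * x for x, w in zip(points, weights))
    second_moment = sum(w * (x - center) ** 2 for x, w in zip(points, weights))
    return max(points) - min(points), second_moment


# The diameter grows linearly in r while the averaged squared volume stays bounded.
results = {r: diameter_and_second_moment(*mixture_with_outlier(r))
           for r in (10.0, 100.0, 1000.0)}
```

In this example the diameter equals r + 1, whereas the integral of $M_1^2$ equals $2 - 2/r^2$ and never exceeds 2, so a threshold of the form (93) is insensitive to how far the outlier lies.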



9.4 On d-Separation w.r.t. (d + 1)-Simplices and Its Implications 

A different notion of d-separation was previously used in the setting of d-regular measures on
H [13, 11]. It is based on d-separation of (d + 1)-simplices (instead of d-simplices). We adapt
this notion to the current setting and explain its relation with the d-separation defined here; we also
describe its implications.

We say that a (d + 1)-simplex $X = (x_0, \ldots, x_{d+1}) \in \mathrm{supp}(\mu)^{d+2}$ is d-separated (for $\omega$) if all of
its faces are d-separated as d-simplices (for $\omega$). That is,

$$\min_{0 \le i \le d+1} M_d(X(i)) \ge \omega \cdot \mathrm{diam}(\mu)^d. \tag{94}$$

We say that $\mu$ is d-separated w.r.t. (d + 1)-simplices (with positive constants $\omega$, $\epsilon$ and $\tau$) if there
exist sets $V_i \subseteq U_i \subseteq \mathrm{supp}(\mu)$, $0 \le i \le d+1$, such that for each $0 \le i \le d+1$:

1. $\mu(V_i) \ge \epsilon$.

2. $\mathrm{dist}_\mu(V_i, U_i^c) := \inf_{x \in V_i \cap \mathrm{supp}(\mu),\ y \in U_i^c \cap \mathrm{supp}(\mu)} \|x - y\| \ge \tau \cdot \mathrm{diam}(\mu)$.

3. $\prod_{i=0}^{d+1} U_i \subseteq \{X \in \mathrm{supp}(\mu)^{d+2} : \min_{0 \le i \le d+1} M_d(X(i)) \ge \omega \cdot \mathrm{diam}(\mu)^d\}$.

In view of Lemma 5.1.1 and its proof, d-separation is almost identical to d-separation w.r.t. (d + 1)-simplices.
The typical example of a d-separated measure which is not d-separated with respect
to (d + 1)-simplices is a measure supported on d + 1 atoms with positive d-volume. One can add
another part of the support lying on a (d − 1)-flat containing d of these atoms and provide this way
additional examples.
Nevertheless, the extra care taken in defining d-separation w.r.t. d-simplices is necessary in
formulating the following stronger version of Theorem 5.1, which restricts the integral of the squared GCN
to the following set of simplices with sufficiently large edge lengths (with respect to $\tau$):

$$LE_\tau(\mu) = \{X \in \mathrm{supp}(\mu)^{d+2} : \min(X) \ge \tau \cdot \mathrm{diam}(\mu)\}.$$

Theorem 9.2. If $\mu$ is d-separated (for $\omega$, $\epsilon$ and $\tau$) w.r.t. (d + 1)-simplices, then

u • e + V uj 2 -e J J LEtM 

The proof of this theorem follows the one of [15, Theorem 1.1]. This type of control was
necessary in [13, 15] since singular curvature functions were used instead of GCNs and they had to
be further integrated along various "scales" t w.r.t. the measure dt/t. Clearly, it is not necessary
in the current context.

9.5 Extension to Metric Spaces 

It will be interesting to extend some of our results to metric spaces. In particular, by choosing
appropriate metric GCNs one can obtain a corresponding notion of an approximate best-fit subspace.
This task is considered in [17] for the purpose of clustering d-dimensional smooth structures
in metric spaces.